Abstract

In [Chen, D., Owen, Ann. Stat., 39, 673-701, 2011] Markov chain 
Monte Carlo (MCMC) was studied under the assumption that the 
driver sequence is a deterministic sequence rather than independent 
U(0, 1) random variables. Therein it was shown that as long as the 
driver sequence is completely uniformly distributed, the Markov chain 
consistently samples the target distribution. The present work extends 
these results by providing bounds on the convergence rate of the dis- 
crepancy between the empirical distribution of the Markov chain and 
the target distribution, under the assumption that the Markov chain 
is uniformly ergodic. 

In a general setting we show the existence of driver sequences for 
which the discrepancy of the Markov chain from the target distribution 
with respect to certain test sets converges with (almost) the usual 
Monte Carlo rate of $n^{-1/2}$.

1 Introduction



Markov chain Monte Carlo (MCMC) algorithms are used for the approximation
of an expected value with respect to the stationary probability measure $\pi$
of the chain. This is done by simulating a Markov chain $(X_i)_{i \ge 1}$ and using the
sample average $\frac{1}{n}\sum_{i=1}^{n} f(X_i)$ to estimate the mean
$\mathbb{E}_\pi(f) := \int_G f(x)\,\pi(dx)$,
where $G$ is the state space and $f$ is a real-valued function defined on $G$. This
method is a staple tool in the physical sciences and Bayesian statistics.

A single transition from $X_{i-1}$ to $X_i$ of a Markov chain is generated by
using the current state and a random source $U_i$, usually taken from an
i.i.d. $U(0,1)$ sequence $(U_i)_{i \ge 1}$ of random numbers. In contrast, the Markov
chain quasi-Monte Carlo idea is as follows: Substitute the sequence of random
numbers by a deterministically constructed finite sequence of numbers
$(u_i)_{1 \le i \le n}$ in $[0,1]^s$ for all $n \in \mathbb{N}$. Numerical experiments suggest that for
judiciously chosen deterministic pseudo-random numbers $(u_i)_{1 \le i \le n}$ this can lead
to significant improvements. Owen and Tribble [25] and Tribble [M] report
an improvement by a factor of up to $10^3$ and a faster convergence rate for a
Gibbs sampling problem. There were also previous attempts which provided
evidence that the approach leads to comparable results [161 113 E2]. Another
line of research, dealing with the so-called array-RQMC method, also combines
MCMC with quasi-Monte Carlo [15]. For a thorough literature review
we refer to [6, Subsection 1.1 (Literature review)].

Recently in the work of Chen, Dick and Owen [6] and Chen [5], the first 
theoretical justification of the Markov chain quasi-Monte Carlo approach on 
continuous state spaces was provided. Therein a consistency result is proven 
if the random sequence $(U_i)_{i \ge 1}$ is substituted by a deterministic 'completely
uniformly distributed' (CUD) sequence $(u_i)_{i \ge 1}$ and the integrand $f$ is
continuous. For a precise definition of CUD sequences we refer to [6] [7] and for
the construction of weakly CUD sequences we refer to [35]. The consistency
result of Markov chain quasi-Monte Carlo corresponds to an ergodic theorem 
for Markov chain Monte Carlo. However, from the result in [6] it is not clear 
how fast the sample average converges to the desired expectation. The goal 
of this paper is to investigate the convergence behavior of such Markov chain 
quasi-Monte Carlo algorithms. We describe the setting and main results in 
the following. 

Throughout the paper we deal with uniformly ergodic Markov chains on
a state space $G \subseteq \mathbb{R}^d$ and a probability space $(G, \mathcal{B}(G), \pi)$, where $\mathcal{B}(G)$
is the Borel $\sigma$-algebra defined on $G$ and $\pi$ is the stationary distribution
of the Markov chain, for details see for example [201 ESI ES]. We assume
that the Markov chain can be generated by an update function
$\varphi : G \times [0,1]^s \to G$, that is, $X_i = \varphi(X_{i-1}; U_i)$ for all $i \ge 1$. We fix a starting
point $x_0 = x$ and replace the random numbers $(U_i)_{i \ge 1}$ by a deterministic
sequence $(u_i)_{i \ge 1}$ to generate the deterministic points $x_i = \varphi(x_{i-1}; u_i)$ for
$i \ge 1$. The convergence behavior of the Markov chain is measured using
a generalized Kolmogorov-Smirnov test between the stationary distribution
$\pi$ and the empirical distribution $\hat{\pi}_n(A) := \frac{1}{n}\sum_{i=1}^{n} 1_{x_i \in A}$, where $1_{x_i \in A}$ is the
indicator function of the set $A \in \mathcal{B}(G)$. The discrepancy is defined by taking
the supremum of $|\pi(A) - \hat{\pi}_n(A)|$ over all sets in $\mathscr{A} \subseteq \mathcal{B}(G)$ (since the empirical
distribution is based on a finite number of points in $G$ we generally have
$\mathscr{A} \neq \mathcal{B}(G)$, see below for a more detailed description). Under these assumptions
we prove that, for each $n \in \mathbb{N}$, there exists a finite sequence of numbers
$(u_i)_{1 \le i \le n}$ such that this discrepancy converges with order $\mathcal{O}(n^{-1/2}(\log n)^{1/2})$
as $n$ tends to infinity. This is roughly the convergence rate which one would
expect from MCMC algorithms based on random inputs.

A drawback of our results is that we are currently not able to give explicit
constructions of sequences $(u_i)_{1 \le i \le n}$ for which our discrepancy bounds hold.
This is because our proofs make essential use of probabilistic arguments.
Namely, we use a Hoeffding inequality by Glynn and Ormoneit [10] and some
results by Talagrand [33] on empirical processes and a result by Haussler [12].
Roughly speaking, we use the Hoeffding inequality to show that the probability
of all $(X_i)_{1 \le i \le n}$ with small discrepancy is bigger than 0, which implies the
existence of a Markov chain with small discrepancy. We do, however, give a
criterion (which we call 'push-back discrepancy') which the numbers $(u_i)_{1 \le i \le n}$
need to satisfy such that the point set $(x_i)_{1 \le i \le n}$ has small discrepancy. This
is done by showing that the discrepancy of $(x_i)_{1 \le i \le n}$ is close to the push-back
discrepancy of the driver sequence $(u_i)_{1 \le i \le n}$. This should eventually
lead to explicit constructions of suitable driver sequences. As a corollary to
the relation between the discrepancy of the Markov chain and the push-back
discrepancy of the driver sequence, we obtain a Koksma-Hlawka inequality
for Markov chains in terms of the discrepancy of the driver sequence. We
point out that the push-back discrepancy generally differs from the CUD
property studied in [5] and [6]. Convergence rates beyond the usual Monte
Carlo rate of $n^{-1/2}$ have previously been shown for array-RQMC [15] and in
[5, Chapter 6]. In both of these instances, a direct simulation is (at least in
principle) possible.

Our results on the discrepancy of the points $(x_i)_{1 \le i \le n}$ can also be
understood as an extension of results on point distributions in the unit cube $[0,1]^s$,
see [13], to uniformly ergodic Markov chains.

We give a brief outline of our work. In the next section we provide
background information on uniformly ergodic Markov chains, give a relation
between the transition kernel of a Markov chain and its update function
and state some examples which satisfy the convergence properties. We also
give some background on discrepancy and describe our results in more detail.
In Section 3 we provide the notion of discrepancy with respect to the driver
sequence and we prove the close relation between the two types of discrepancy
for uniformly ergodic Markov chains from which we deduce a Koksma-Hlawka
type inequality. In Section 4 we prove the main results. The appendix
contains sections on $\delta$-covers, the integration error and some technical proofs.

2 Background and notation 

In this section we provide the necessary background on discrepancy and 
uniformly ergodic Markov chains. 

2.1 Discrepancy 

The convergence behavior of the Markov chain is analyzed with respect to 
a distance measure between the empirical distribution of the Markov chain 
and its stationary distribution $\pi$. It can be viewed as an extension of the
Kolmogorov-Smirnov test and is a well established concept in numerical
analysis and number theory [8]. We analyze the empirical distribution of the first
$n$ points of the Markov chain $X_1, \dots, X_n$ by assigning each point the same
weight and defining the empirical measure of a set $A \in \mathcal{B}(G)$ by

$$\hat{\pi}_n(A) = \frac{1}{n} \sum_{i=1}^{n} 1_{X_i \in A},$$

where the indicator function is given by

$$1_{X_i \in A} = \begin{cases} 1 & \text{if } X_i \in A, \\ 0 & \text{otherwise.} \end{cases}$$

The local discrepancy between the empirical distribution and the stationary
distribution is then

$$\Delta_{n,A} = \hat{\pi}_n(A) - \pi(A).$$

To obtain a measure for the discrepancy we take the supremum of $|\Delta_{n,A}|$
over certain sets $A$. Note that since the empirical measure uses only a finite
number of points the local discrepancy $\Delta_{n,A}$ does not converge to 0 in general
if we take the supremum over all sets in $\mathcal{B}(G)$. Thus we restrict the supremum
to a set of so-called test sets $\mathscr{A} \subseteq \mathcal{B}(G)$. Now we define the discrepancy.

Definition 1 (Discrepancy) The discrepancy of $P_n = \{X_1, \dots, X_n\} \subseteq G$
is given by

$$D^*_{\mathscr{A},\pi}(P_n) = \sup_{A \in \mathscr{A}} |\Delta_{n,A}|.$$

This is the measure which we use to analyze the convergence behavior of the
Markov chain as $n$ goes to $\infty$.
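
To make Definition 1 concrete, the following minimal sketch evaluates the discrepancy in the simplest special case, which is an assumption made here for illustration: $G = [0,1]$, $\pi$ the uniform distribution, and test sets $(-\infty, x) \cap G$. In this case the supremum is attained at the sample points, so it can be computed from the sorted sample. The function name `star_discrepancy_1d` is hypothetical.

```python
def star_discrepancy_1d(points):
    """Discrepancy of points in [0, 1] with respect to the uniform
    distribution, using the test sets (-inf, x) intersected with [0, 1].

    For a test set (-inf, t) the local discrepancy is
    #{x_i < t} / n - t, and its absolute value is largest either at a
    sample point or just to the right of one.
    """
    xs = sorted(points)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        # Just to the right of x at least i + 1 points have been counted;
        # at t = x itself at most i points have been counted.
        worst = max(worst, (i + 1) / n - x, x - i / n)
    return worst


# Midpoints of four equal cells are as evenly spread as four points can be.
print(star_discrepancy_1d([0.125, 0.375, 0.625, 0.875]))  # 0.125
```

For Markov chain output the same routine can be applied to the simulated states $x_1, \dots, x_n$ whenever the target is uniform on $[0,1]$; for other targets the term `x` would have to be replaced by the value of the distribution function of $\pi$ at `x`.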

In Appendix B we provide a relationship between the discrepancy $D^*_{\mathscr{A},\pi}(P_n)$
and the integration error of functions in a certain function space $H_1$, where
the set of test sets is given by

$$\mathscr{A} = \{(-\infty, x)_G : x \in \bar{\mathbb{R}}^d\},$$

with $\bar{\mathbb{R}}^d = (\mathbb{R} \cup \{\infty, -\infty\})^d$ and
$(-\infty, x)_G := (-\infty, x) \cap G = \prod_{j=1}^{d} (-\infty, \xi_j) \cap G$
for $x = (\xi_1, \dots, \xi_d)$. In particular, if there is at least one $j$ with $\xi_j = -\infty$,
then $(-\infty, x)_G = \emptyset$, whereas if all $\xi_j = \infty$, then $(-\infty, x)_G = G \subseteq \mathbb{R}^d$. For
functions $f \in H_1$ we have

$$\left| \mathbb{E}_\pi(f) - \frac{1}{n} \sum_{i=1}^{n} f(X_i) \right| \le D^*_{\mathscr{A},\pi}(P_n)\, \|f\|_{H_1}.$$

Inequalities of this form are called Koksma-Hlawka inequalities, see [H Chapter 2]
for more information. See Appendix B for details on the definition of
the space $H_1$ and the proof of the inequality.



2.2 Markov chains 

The main assumption on the Markov chain in [5] and [6] is the existence 
of a coupling region, or in a weakened version, a contraction assumption 
on the update function. Roughly speaking, this means that if one starts 
two Markov chains at different starting points but uses the same random 
numbers as updates, then the points of the chain coincide or move closer to 
each other as the chain progresses. In this paper, we replace this assumption 
by the assumption that the Markov chain is uniformly ergodic. The concept
of uniform ergodicity is much closer to the concept of discrepancy, which
allows us to obtain stronger results than previous attempts. We introduce 
uniformly ergodic Markov chains in the following. 

Let $G \subseteq \mathbb{R}^d$ and let $\mathcal{B}(G)$ denote the Borel $\sigma$-algebra of $G$. In the following
we provide the definition of a transition kernel.

Definition 2 The function $K : G \times \mathcal{B}(G) \to [0,1]$ is called transition kernel if

(i) for each $x \in G$ the mapping $A \in \mathcal{B}(G) \mapsto K(x, A)$ is a probability
measure on $(G, \mathcal{B}(G))$, and

(ii) for each $A \in \mathcal{B}(G)$ the mapping $x \in G \mapsto K(x, A)$ is a $\mathcal{B}(G)$-measurable
real-valued function.

Let $K : G \times \mathcal{B}(G) \to [0,1]$ be a transition kernel. We assume that $\pi$ is
the unique stationary distribution of the transition kernel $K$, i.e.

$$\int_G K(x, A)\, \pi(dx) = \pi(A), \quad \forall A \in \mathcal{B}(G).$$

The transition kernel $K$ gives rise to a Markov chain $X_0, X_1, X_2, \dots \in G$ in
the following way. Let $X_0 = x$ with $x \in G$ and $i \in \mathbb{N}$. Then, for a given
$X_{i-1}$, we choose $X_i$ with distribution $K(X_{i-1}, \cdot)$, that is, for all $A \in \mathcal{B}(G)$,
the probability that $X_i \in A$ is given by $K(X_{i-1}, A)$.

Definition 3 (Total variation distance) The total variation distance between
the $j$-step transition kernel $K^j(x, \cdot)$ and the stationary distribution $\pi$ is defined
by

$$\|K^j(x, \cdot) - \pi\|_{tv} = \sup_{A \in \mathcal{B}(G)} |K^j(x, A) - \pi(A)|.$$

Note that with $K^0(x, A) = 1_{x \in A}$ we have

$$K^j(x, A) = \int_G K(y, A)\, K^{j-1}(x, dy) = \int_G K^{j-1}(y, A)\, K(x, dy).$$

Definition 4 (Uniform ergodicity) Let $\alpha \in [0,1)$ and $M \in (0, \infty)$. The
transition kernel $K$ is uniformly ergodic with $(\alpha, M)$ iff for any $x \in G$ and
$j \in \mathbb{N}$ we have

$$\|K^j(x, \cdot) - \pi\|_{tv} \le \alpha^j M.$$

A Markov chain with transition kernel $K$ is called uniformly ergodic if there
exists an $\alpha \in [0,1)$ and $M \in (0, \infty)$, such that the transition kernel is
uniformly ergodic with $(\alpha, M)$.

Remark 1 The uniform ergodicity is used in the estimates of the discrepancy
and is a necessary condition for the Hoeffding inequality [10]. For Markov
chains which satisfy weaker convergence properties, for definitions see [27[ 
[2ty one must use other concentration inequalities. The papers fJ\ [HJ \2J$ 
might be useful to get similar results in other settings. 

Let us state a result which provides an equivalent statement to uniform
ergodicity. Let $L_\infty$ be the set of all bounded functions $f : G \to \mathbb{R}$. Then
define the operator $P^j : L_\infty \to L_\infty$ by

$$P^j f(x) = \int_G f(y)\, K^j(x, dy),$$

and the expectation with respect to $\pi$ is denoted by $\mathbb{E}_\pi(f) = \int_G f(y)\, \pi(dy)$.
The following result is well known, for a proof of this fact see for example
[30, Proposition 3.23, p. 48].

Proposition 1 Let $\alpha \in [0,1)$ and $M \in (0, \infty)$. Then the following
statements are equivalent:

(i) The transition kernel $K$ is uniformly ergodic with $(\alpha, M)$.

(ii) The operator $P^j - \mathbb{E}_\pi$ satisfies

$$\|P^j - \mathbb{E}_\pi\|_{L_\infty \to L_\infty} \le 2M\alpha^j, \quad j \in \mathbb{N}.$$

In the following we introduce update functions $\varphi$ for a given transition
kernel.

Definition 5 (Update function) Let $\varphi : G \times [0,1]^s \to G$ be a measurable
function and

$$B : G \times \mathcal{B}(G) \to \mathcal{B}([0,1]^s),$$
$$B(x, A) = \{u \in [0,1]^s : \varphi(x; u) \in A\},$$

where $\mathcal{B}([0,1]^s)$ is the Borel $\sigma$-algebra of $[0,1]^s$. Let $\lambda_s$ denote the Lebesgue
measure on $\mathbb{R}^s$. Then the function $\varphi$ is an update function for the transition
kernel $K$ iff

$$K(x, A) = \mathbb{P}(\varphi(x; U) \in A) = \lambda_s(B(x, A)), \qquad (1)$$

where $\mathbb{P}$ is the probability measure for the uniform distribution in $[0,1]^s$.

Example 1 (Direct simulation) Let us assume that we can sample with
respect to $\pi$, i.e. $K(x, A) = \pi(A)$ for all $x \in G$. For the moment let $G = [0,1]^s$
and let $\pi$ be the uniform distribution on $G$. In this case we can choose the
simple update function $\varphi(x; u) = u$, since then

$$\pi(A) = \lambda_s(B(x, A)) \quad \text{for all } x \in G.$$

If $G$ is a general subset of $\mathbb{R}^d$ and $\pi$ is a general probability measure, then we
need a generator, see JMj. A generator is a special update function $\psi : [0,1]^s \to G$
such that

$$\pi(A) = \mathbb{P}(\psi(U) \in A), \quad \text{for all } A \in \mathcal{B}(G).$$

Note that the transition kernel $K(x, A) = \pi(A)$ is uniformly ergodic with
$(\alpha, M)$ for $\alpha = 0$ and $M \in (0, \infty)$.
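
As an illustration of Definition 5, the sketch below uses a toy chain that is not taken from the paper: a two-state chain on $G = \{0, 1\}$ with $s = 1$, where the probability of moving to state 1 from state $x$ is `p[x]`. The names `phi`, `lebesgue_measure_of_B` and `run_chain` and the numbers in `p` are hypothetical. The second function approximates $\lambda_1(B(x, A))$ on a midpoint grid, so that relation (1) can be checked numerically, and the third one drives the chain with a deterministic sequence.

```python
def phi(x, u, p=(0.3, 0.8)):
    """Update function of a two-state chain on G = {0, 1}.

    p[x] is the (assumed) probability of moving from state x to state 1,
    so K(x, {1}) = p[x] and K(x, {0}) = 1 - p[x].
    """
    return 1 if u < p[x] else 0


def lebesgue_measure_of_B(x, target, grid=10_000, p=(0.3, 0.8)):
    """Approximate lambda(B(x, A)) with B(x, A) = {u in [0,1] : phi(x; u) in A}
    by counting midpoints of an equispaced grid that land in B(x, A)."""
    hits = sum(phi(x, (k + 0.5) / grid, p) in target for k in range(grid))
    return hits / grid


def run_chain(x0, drivers, p=(0.3, 0.8)):
    """Generate x_i = phi(x_{i-1}; u_i) from a driver sequence u_1, ..., u_n."""
    states, x = [], x0
    for u in drivers:
        x = phi(x, u, p)
        states.append(x)
    return states


# K(0, {1}) = 0.3 should match the measure of the set of drivers mapping 0 to 1.
print(lebesgue_measure_of_B(0, {1}))
print(run_chain(0, [0.1, 0.9, 0.5]))  # [1, 0, 0]
```

The point of the construction is that all randomness enters through `u`, so replacing random drivers by a deterministic sequence requires no change to `phi`.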

Example 2 (Hit-and-run algorithm) Let $G \subseteq \mathbb{R}^d$ be a compact convex body
and $\pi$ be the uniform distribution on $G$. Let
$\mathbb{S}^{d-1} = \{x \in \mathbb{R}^d : \|x\|_2 = \langle x, x \rangle^{1/2} = 1\}$
be the $(d-1)$-dimensional sphere, where $\langle x, y \rangle$ denotes the standard
inner product in $\mathbb{R}^d$. Let $\theta \in \mathbb{S}^{d-1}$ and let $L(x, \theta)$ be the chord in $G$
through $x$ and $x + \theta$, i.e.

$$L(x, \theta) = \{x + s\theta \in \mathbb{R}^d \mid s \in \mathbb{R}\} \cap G.$$

We assume that we have an oracle which gives us $a(x, \theta), b(x, \theta) \in G$, such
that

$$[a(x, \theta), b(x, \theta)] = L(x, \theta),$$

where $[a(x, \theta), b(x, \theta)] = \{\lambda a(x, \theta) + (1 - \lambda) b(x, \theta) : \lambda \in [0,1]\}$. A transition
of the hit-and-run algorithm works as follows. First, choose a random
direction $\theta$. Then we sample the next state on $[a(x, \theta), b(x, \theta)]$ uniformly.
Let $\psi : [0,1]^{d-1} \to \mathbb{S}^{d-1}$ be a generator for the uniform distribution
on the sphere, see for instance JPJ/. Then we can choose for $x \in G$ and
$u = (v_1, v_2, \dots, v_d) \in [0,1]^d$ the update function

$$\varphi(x, u) = v_d\, a(x, \psi(v_1, \dots, v_{d-1})) + (1 - v_d)\, b(x, \psi(v_1, \dots, v_{d-1})).$$

In [31] it is shown that there exists an $\alpha \in [0,1)$ and an $M \in (0, \infty)$, such
that the hit-and-run algorithm is uniformly ergodic with $(\alpha, M)$.
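
The hit-and-run update function can be sketched for one concrete body. The choice $d = 2$ with $G$ the closed unit disc is an assumption made here so that the chord oracle has a closed form; the function names are hypothetical. The generator maps $v_1$ to a point on the circle, the oracle solves $\|x + s\theta\|_2 = 1$ for $s$, and the update takes the convex combination of the two chord endpoints with weight $v_2$.

```python
import math


def psi_circle(v):
    """Generator for the uniform distribution on the unit circle S^1."""
    angle = 2.0 * math.pi * v
    return (math.cos(angle), math.sin(angle))


def chord_endpoints(x, theta):
    """Oracle for the unit disc: endpoints a, b of the chord through x
    in direction theta (theta is assumed to have Euclidean norm 1)."""
    # |x + s * theta|^2 = 1  <=>  s^2 + 2 * b * s + c = 0
    b = x[0] * theta[0] + x[1] * theta[1]
    c = x[0] ** 2 + x[1] ** 2 - 1.0
    root = math.sqrt(max(b * b - c, 0.0))
    s_lo, s_hi = -b - root, -b + root
    a_pt = (x[0] + s_lo * theta[0], x[1] + s_lo * theta[1])
    b_pt = (x[0] + s_hi * theta[0], x[1] + s_hi * theta[1])
    return a_pt, b_pt


def hit_and_run_update(x, u):
    """phi(x, u) = v_2 * a(x, psi(v_1)) + (1 - v_2) * b(x, psi(v_1))."""
    v1, v2 = u
    a_pt, b_pt = chord_endpoints(x, psi_circle(v1))
    return (v2 * a_pt[0] + (1.0 - v2) * b_pt[0],
            v2 * a_pt[1] + (1.0 - v2) * b_pt[1])


# From the centre, direction (1, 0): the chord is [-1, 1] on the first axis.
print(hit_and_run_update((0.0, 0.0), (0.0, 0.25)))
```

For a general convex body only `chord_endpoints` would change; the update itself uses nothing but the two endpoints and the last driver coordinate.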
Example 3 (Independence Metropolis sampler) Let $G \subseteq \mathbb{R}^d$ be bounded and
$\pi$ be a probability measure on $G$ with possibly non-normalized density function
$\rho : G \to (0, \infty)$, i.e.

$$\pi(A) = \frac{\int_A \rho(x)\, dx}{\int_G \rho(x)\, dx}, \quad A \in \mathcal{B}(G).$$

Let us assume that we have a generator $\psi : [0,1]^s \to G$ for the uniform
distribution on $G$. Let

$$a(x, y) = \min\{1, \rho(y)/\rho(x)\}$$

be the acceptance probability of the Metropolis transition. Then we can choose
for $x \in G$ and $u = (v_1, v_2, \dots, v_{s+1}) \in [0,1]^{s+1}$ the update function

$$\varphi(x; u) = \begin{cases} \psi(v_1, \dots, v_s) & \text{if } v_{s+1} \le a(x, \psi(v_1, \dots, v_s)), \\ x & \text{otherwise.} \end{cases}$$

In IW[ Theorem 2.1., p. 105] a sufficient criterion for uniform ergodicity of
the independence Metropolis algorithm is provided. A local proposal Metropolis
algorithm can also be uniformly ergodic, see for example JTffl .
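
A minimal sketch of an independence Metropolis update function follows, under assumptions made only for illustration: $G = [0,1]$, $s = 1$, the generator $\psi(v) = v$ for the uniform proposal, the non-normalized density $\rho(x) = 1 + x$, and the usual Metropolis acceptance ratio $\min\{1, \rho(y)/\rho(x)\}$. The names `rho` and `independence_metropolis_update` are hypothetical.

```python
def rho(x):
    """Assumed non-normalized target density on G = [0, 1]."""
    return 1.0 + x


def independence_metropolis_update(x, u, density=rho):
    """Update function driven by u = (v_1, v_2) in [0, 1]^2.

    v_1 generates the proposal (psi(v_1) = v_1 is uniform on [0, 1]);
    v_2 decides acceptance with the standard Metropolis ratio.
    """
    v1, v2 = u
    proposal = v1
    accept_prob = min(1.0, density(proposal) / density(x))
    return proposal if v2 <= accept_prob else x


# A proposal with higher density is always accepted; one with lower
# density is accepted only if v_2 falls below the density ratio.
print(independence_metropolis_update(0.5, (0.9, 0.99)))  # 0.9
print(independence_metropolis_update(1.0, (0.0, 0.9)))   # 1.0
```

Note that the update consumes $s + 1$ driver coordinates per step: $s$ for the proposal and one for the accept/reject decision.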

Let us briefly add some more examples. The slice sampler, for details with
respect to the algorithm and update functions see [23], is under additional
assumptions uniformly ergodic, see [22]. Furthermore, the Gibbs sampler for
sampling the uniform distribution is uniformly ergodic if the boundary of $G$
is smooth enough, see [28].

Above we defined the set $B(x, A)$, which is for $x \in G$ and $A \in \mathcal{B}(G)$
the set of random numbers $u$ which takes $x$ into the set $A$ using the update
function $\varphi$ with arguments $x$ and $u$. We now define sets of random numbers
which take $x$ to $A$ in $i \in \mathbb{N}$ steps. Let $\varphi_1(x; u) = \varphi(x; u)$ and for $i > 1$ let

$$\varphi_i : G \times [0,1]^{is} \to G,$$
$$\varphi_i(x; u_1, u_2, \dots, u_i) = \varphi(\varphi_{i-1}(x; u_1, u_2, \dots, u_{i-1}); u_i),$$

that is, $\varphi_i(x; u_1, u_2, \dots, u_i) \in G$ is the point obtained via $i$ updates using
$u_1, u_2, \dots, u_i \in [0,1]^s$ where the starting point is $x \in G$.

Lemma 1 Let $i, j \in \mathbb{N}$ and $i > j$. For any $u_1, \dots, u_i \in [0,1]^s$ and $x \in G$ we
have

$$\varphi_i(x; u_1, \dots, u_i) = \varphi_{i-j}(\varphi_j(x; u_1, \dots, u_j); u_{j+1}, \dots, u_i). \qquad (2)$$

Proof. The assertion can be proven by induction over $i$. □

For $i \ge 1$ let

$$B_i : G \times \mathcal{B}(G) \to \mathcal{B}([0,1]^{is}),$$
$$B_i(x, A) = \{(u_1, u_2, \dots, u_i) \in [0,1]^{is} : \varphi_i(x; u_1, u_2, \dots, u_i) \in A\}.$$

We therefore have $B_1(x, A) = B(x, A)$. Note that $B_i(x, A) \subseteq [0,1]^{is}$. The
next lemma is important to understand the relation between the update
function and the transition kernel.

Lemma 2 Let $\varphi$ be an update function for the transition kernel $K$. Let
$n \in \mathbb{N}$ and $F : G^n \to \mathbb{R}$. The expectation with respect to the joint distribution
of $X_1, \dots, X_n$ from the Markov chain starting at $x_0 \in G$ is given by

$$\mathbb{E}_{x_0,K}(F(X_1, \dots, X_n)) = \int_G \cdots \int_G F(x_1, \dots, x_n)\, K(x_{n-1}, dx_n) \cdots K(x_0, dx_1).$$

Then

$$\mathbb{E}_{x_0,K}(F(X_1, \dots, X_n)) = \int_{[0,1]^{ns}} F(\varphi_1(x_0; u_1), \dots, \varphi_n(x_0; u_1, \dots, u_n))\, du_1 \cdots du_n, \qquad (3)$$

whenever one of the integrals exists.

Note that the right-hand side of (3) is the expectation with respect to the
uniform distribution in $[0,1]^{ns}$.

Proof. First note that by the definition of the update function we obtain for
any $\pi$-integrable function $f : G \to \mathbb{R}$ that

$$\int_G f(y)\, K(x, dy) = \int_{[0,1]^s} f(\varphi(x, u))\, du. \qquad (4)$$

By the application of Lemma 1 and (4) we obtain

$$\int_{[0,1]^{ns}} F(\varphi_1(x_0; u_1), \dots, \varphi_n(x_0; u_1, \dots, u_n))\, du_1 \cdots du_n$$
$$= \int_{[0,1]^{(n-1)s}} \int_G F(x_1, \varphi_1(x_1; u_2), \dots, \varphi_{n-1}(x_1; u_2, \dots, u_n))\, K(x_0, dx_1)\, du_2 \cdots du_n.$$

The iteration of this procedure leads to the assertion. □

Corollary 1 Let $\varphi$ be an update function for the transition kernel $K$ and let
$\pi$ be the stationary distribution of $K$. For any $i \in \mathbb{N}$ and $A \in \mathcal{B}(G)$ we have
$K^i(x, A) = \lambda_{is}(B_i(x, A))$. In particular

$$\int_G \lambda_{is}(B_i(x, A))\, \pi(dx) = \pi(A).$$

Proof. Set for $n \ge i$

$$F(x_1, \dots, x_n) = 1_{x_i \in A}.$$

Then by Lemma 2 we obtain $K^i(x, A) = \lambda_{is}(B_i(x, A))$ and by the stationarity
of $\pi$ the proof is complete. □



3 On the discrepancies of the Markov chain 
and driver sequence 

Recall that the star-discrepancy of a point set $P_n = \{x_1, x_2, \dots, x_n\} \subseteq G$
with respect to the distribution $\pi$ is given by

$$D^*_{\mathscr{A},\pi}(P_n) = \sup_{A \in \mathscr{A}} \left| \frac{1}{n} \sum_{i=1}^{n} 1_{x_i \in A} - \pi(A) \right|.$$

Let us assume that $u_1, u_2, \dots, u_n \in [0,1]^s$ is a finite deterministic
sequence. We call this finite sequence driver sequence. Then let the set
$P_n = \{x_1, x_2, \dots, x_n\} \subseteq G$ be given by

$$x_i = x_i(x_0) = \varphi(x_{i-1}; u_i) = \varphi_i(x_0; u_1, \dots, u_i), \quad i = 1, \dots, n. \qquad (5)$$

We now define a discrepancy measure on the driver sequence. Below we
show how this discrepancy is related to the discrepancy of the Markov chain.

Definition 6 (Push-back discrepancy) Let $U_n = (u_1, u_2, \dots, u_n) \in [0,1]^{ns}$
and let $B_i$ be defined as above. Define the local discrepancy function by

$$\Delta^{\mathrm{loc}}_{n,A,\varphi}(x; u_1, \dots, u_n) = \frac{1}{n} \sum_{i=1}^{n} \left[ 1_{(u_1, \dots, u_i) \in B_i(x, A)} - \lambda_{is}(B_i(x, A)) \right].$$

Let $\mathscr{A} \subseteq \mathcal{B}(G)$ be a set of test sets. Then we define the discrepancy of the
driver sequence by

$$D^*_{\mathscr{A},\varphi}(U_n) = \sup_{A \in \mathscr{A}} \left| \Delta^{\mathrm{loc}}_{n,A,\varphi}(x; u_1, \dots, u_n) \right|.$$

We call $D^*_{\mathscr{A},\varphi}(U_n)$ the push-back discrepancy.

The discrepancy of the driver sequence $D^*_{\mathscr{A},\varphi}(U_n)$ is a 'push-back
discrepancy' since the test sets $B_i(x, A)$ are derived from the test sets $A \in \mathscr{A}$
from the discrepancy of the Markov chain $D^*_{\mathscr{A},\pi}(P_n)$ via inverting the update
function.

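The following theorem provides an estimate of the star-discrepancy of $P_n$
with respect to properties of the driver sequence and the transition kernel.

Theorem 1 Let $K$ be a transition kernel defined on $G \subseteq \mathbb{R}^d$ with stationary
distribution $\pi$. Let $\varphi$ be an update function for $K$. Let $x_0 = x$ and let
$u_1, u_2, \dots, u_n \in [0,1]^s$ be the driver sequence, such that $P_n$ is given by (5).
Let $\mathscr{A} \subseteq \mathcal{B}(G)$ be a set of test sets. Then

$$\left| D^*_{\mathscr{A},\pi}(P_n) - D^*_{\mathscr{A},\varphi}(U_n) \right| \le \sup_{A \in \mathscr{A}} \left| \frac{1}{n} \sum_{i=1}^{n} K^i(x, A) - \pi(A) \right|.$$

Proof. For any $A \in \mathscr{A}$ we have

$$\left| \frac{1}{n} \sum_{i=1}^{n} 1_{x_i \in A} - \pi(A) \right|
= \left| \frac{1}{n} \sum_{i=1}^{n} \left[ 1_{(u_1, \dots, u_i) \in B_i(x, A)} - K^i(x, A) + K^i(x, A) - \pi(A) \right] \right|$$
$$\le \left| \frac{1}{n} \sum_{i=1}^{n} \left[ 1_{(u_1, \dots, u_i) \in B_i(x, A)} - \lambda_{is}(B_i(x, A)) \right] \right| + \left| \frac{1}{n} \sum_{i=1}^{n} K^i(x, A) - \pi(A) \right|.$$

Note that we used $\lambda_{is}(B_i(x, A)) = K^i(x, A)$ which follows from Corollary 1.
Hence

$$D^*_{\mathscr{A},\pi}(P_n) \le D^*_{\mathscr{A},\varphi}(U_n) + \sup_{A \in \mathscr{A}} \left| \frac{1}{n} \sum_{i=1}^{n} K^i(x, A) - \pi(A) \right|.$$

The inequality

$$D^*_{\mathscr{A},\varphi}(U_n) \le D^*_{\mathscr{A},\pi}(P_n) + \sup_{A \in \mathscr{A}} \left| \frac{1}{n} \sum_{i=1}^{n} K^i(x, A) - \pi(A) \right|$$

follows by the same arguments. □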



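Corollary 2 Let us assume that the conditions of Theorem 1 are satisfied.
Further let $\alpha \in [0,1)$ and $M \in (0, \infty)$ and assume that the transition kernel
is uniformly ergodic with $(\alpha, M)$. Then

$$\left| D^*_{\mathscr{A},\pi}(P_n) - D^*_{\mathscr{A},\varphi}(U_n) \right| \le \frac{\alpha M}{n(1 - \alpha)}.$$

Proof. By the uniform ergodicity with $(\alpha, M)$ we obtain

$$\left| \frac{1}{n} \sum_{i=1}^{n} K^i(x, A) - \pi(A) \right| \le \frac{1}{n} \sum_{i=1}^{n} \left| K^i(x, A) - \pi(A) \right| \le \frac{1}{n} \sum_{i=1}^{n} \alpha^i M \le \frac{\alpha M}{n(1 - \alpha)}.$$

Then by Theorem 1 the assertion is proven. □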
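
The relation between the two discrepancies can be checked numerically on a toy example that is not part of the paper: a two-state chain on $\{0, 1\}$ with transition probabilities `p01` and `p10`, for which $K^i(x_0, \{1\}) = \pi_1 + \lambda^i (1_{x_0 = 1} - \pi_1)$ with $\lambda = 1 - p_{01} - p_{10}$, so the chain is uniformly ergodic with $\alpha = |\lambda|$ and $M = 1$. With the test sets $\{0\}$ and $\{1\}$ both suprema reduce to the absolute local discrepancy of $\{1\}$. All names below are hypothetical.

```python
def two_state_discrepancies(p01, p10, x0, drivers):
    """Return (chain discrepancy, push-back discrepancy, alpha*M/(n(1-alpha)))
    for a two-state chain driven by the given sequence in [0, 1].

    Assumes 0 < p01 + p10 < 2 so that alpha = |1 - p01 - p10| < 1.
    """
    n = len(drivers)
    lam = 1.0 - p01 - p10
    pi1 = p01 / (p01 + p10)           # stationary probability of state 1
    start_is_one = 1.0 if x0 == 1 else 0.0
    x = x0
    chain_sum = 0.0                   # sum of 1_{x_i = 1} - pi({1})
    push_sum = 0.0                    # sum of 1_{x_i = 1} - K^i(x0, {1})
    for i, u in enumerate(drivers, start=1):
        prob_to_one = p01 if x == 0 else 1.0 - p10
        x = 1 if u < prob_to_one else 0          # x_i = phi(x_{i-1}; u_i)
        k_i = pi1 + lam ** i * (start_is_one - pi1)
        chain_sum += x - pi1
        push_sum += x - k_i
    alpha = abs(lam)
    bound = alpha / (n * (1.0 - alpha))          # M = 1 for this chain
    return abs(chain_sum) / n, abs(push_sum) / n, bound


# Driver sequence: multiples of the golden ratio conjugate modulo 1.
drivers = [(k * 0.6180339887) % 1.0 for k in range(1, 201)]
d_chain, d_push, bound = two_state_discrepancies(0.3, 0.4, 0, drivers)
print(d_chain, d_push, bound)
```

The two discrepancies differ only through the deterministic term $\frac{1}{n}\sum_{i=1}^{n} (K^i(x_0, A) - \pi(A))$, which is why their gap is controlled by a geometric sum and decays like $1/n$.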



Remark 2 In the setting of Example 1 where we assumed that $G = [0,1]^s$
and $K(x, A) = \pi(A)$ we obtain that $\alpha = 0$. In this case we get the well studied
star-discrepancy for the uniform distribution on $[0,1]^s$, see for instance /2}/.

Theorem 1 gives an estimate of the star-discrepancy in terms of the
discrepancy of the driver sequence and a quantity which depends on the transition
kernel. We have seen in the previous corollary that for uniformly ergodic
Markov chains we can estimate this quantity. Since our bounds on the
discrepancy $D^*_{\mathscr{A},\pi}(P_n)$ are of order $\mathcal{O}(n^{-1/2}(\log n)^{1/2})$, by Corollary 2 the
push-back discrepancy of the driver sequence satisfies the same convergence
order.

From Corollary 2 and Theorem 3 in Appendix B we now obtain the
following Koksma-Hlawka inequality (cf. [SI Proposition 2.18]).

Corollary 3 (Koksma-Hlawka inequality for uniformly ergodic Markov chains)
Let us assume that the conditions of Theorem 1 are satisfied. Further let
$\alpha \in [0,1)$ and $M \in (0, \infty)$ and assume that the transition kernel is uniformly
ergodic with $(\alpha, M)$. Let $H_1$ denote the space of functions $f : G \to \mathbb{R}$ defined
in Appendix B. Then for all $f \in H_1$ we have

$$\left| \int_G f(x)\, \pi(dx) - \frac{1}{n} \sum_{i=1}^{n} f(x_i) \right| \le \left( D^*_{\mathscr{A},\varphi}(U_n) + \frac{\alpha M}{n(1 - \alpha)} \right) \|f\|_{H_1}.$$

Again, in the setting of Example 1 for direct simulation we have $\alpha = 0$
and we obtain the Koksma-Hlawka inequality

$$\left| \int_G f(x)\, \pi(dx) - \frac{1}{n} \sum_{i=1}^{n} f(x_i) \right| \le D^*_{\mathscr{A},\varphi}(U_n)\, \|f\|_{H_1}.$$



4 On the existence of good driver sequences 

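In this section we show the existence of finite sequences
$U_n = (u_1, u_2, \dots, u_n) \in [0,1]^{ns}$ such that

$$D^*_{\mathscr{A},\varphi}(U_n) \quad \text{and} \quad D^*_{\mathscr{A},\pi}(P_n)$$

converge to 0 if the transition kernel is uniformly ergodic and $P_n$ is given
by (5). The main result is proven for $D^*_{\mathscr{A},\pi}(P_n)$. The result with respect to
$D^*_{\mathscr{A},\varphi}(U_n)$ holds by Theorem 1.

The concept of a $\delta$-cover will be useful (cf. [UJ for a discussion of $\delta$-covers,
bracketing numbers and Vapnik-Červonenkis dimension).

Definition 7 Let $\mathscr{A} \subseteq \mathcal{B}(G)$ be a set of test sets. A finite subset $\Gamma_\delta \subseteq \mathcal{B}(G)$
is called a $\delta$-cover of $\mathscr{A}$ with respect to $\pi$ if for every $A \in \mathscr{A}$ there are sets
$C, D \in \Gamma_\delta$ such that

$$C \subseteq A \subseteq D$$

and

$$\pi(D \setminus C) \le \delta.$$

Remark 3 The concept of a $\delta$-cover is motivated by the following result. Let
us assume that $\Gamma_\delta$ is a $\delta$-cover of $\mathscr{A}$. Then, for all $\{z_1, \dots, z_n\}$, the following
discrepancy inequality holds

$$\sup_{A \in \mathscr{A}} \left| \frac{1}{n} \sum_{i=1}^{n} 1_{z_i \in A} - \pi(A) \right| \le \max_{C \in \Gamma_\delta} \left| \frac{1}{n} \sum_{i=1}^{n} 1_{z_i \in C} - \pi(C) \right| + \delta.$$

Proof. Let $A \in \mathscr{A}$ and let $B, C \in \Gamma_\delta$ with $B \subseteq A \subseteq C$ be such that $\pi(C \setminus B) \le \delta$. Then

$$\frac{1}{n} \sum_{i=1}^{n} 1_{z_i \in A} - \pi(A) \le \frac{1}{n} \sum_{i=1}^{n} 1_{z_i \in C} - \pi(C) + \delta$$

and

$$\frac{1}{n} \sum_{i=1}^{n} 1_{z_i \in A} - \pi(A) \ge \frac{1}{n} \sum_{i=1}^{n} 1_{z_i \in B} - \pi(B) - \delta.$$

Thus the result follows. □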

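Let us introduce the notation $\Delta_{n,A,\varphi,x} = \Delta_{n,A,\varphi}(x; u_1, \dots, u_n)$ and note
that

$$\Delta_{n,A,\varphi,x} = \frac{1}{n} \sum_{i=1}^{n} \left[ 1_{(u_1, \dots, u_i) \in B_i(x, A)} - \pi(A) \right] = \frac{1}{n} \sum_{i=1}^{n} 1_{x_i \in A} - \pi(A). \qquad (6)$$

Lemma 3 Let $K$ be a transition kernel with stationary distribution $\pi$. Let $\varphi$
be an update function of $K$. Let $X_1, X_2, \dots, X_n$ be given by a Markov chain
with transition kernel $K$ and $X_0 = x$. Then for any $A \in \mathcal{B}(G)$ and $c > 0$ we
obtain

$$\mathbb{P}\left[ |\Delta_{n,A,\varphi,x}| \ge c \right] = \mathbb{P}_{x,K}\left[ \left| \frac{1}{n} \sum_{i=1}^{n} 1_{X_i \in A} - \pi(A) \right| \ge c \right], \qquad (7)$$

where $\mathbb{P}$ is the probability measure for the uniform distribution in $[0,1]^{ns}$ and
$\mathbb{P}_{x,K}$ is the joint probability of $X_1, \dots, X_n$ with $X_0 = x$.

Proof. Let

$$J(A, c) = \left\{ (z_1, \dots, z_n) \in G^n : \left| \frac{1}{n} \sum_{i=1}^{n} 1_{z_i \in A} - \pi(A) \right| \ge c \right\}.$$

Set

$$F(x_1, \dots, x_n) = 1_{(x_1, \dots, x_n) \in J(A, c)} = \begin{cases} 1 & \text{if } \left| \frac{1}{n} \sum_{i=1}^{n} 1_{x_i \in A} - \pi(A) \right| \ge c, \\ 0 & \text{otherwise.} \end{cases}$$

By

$$\mathbb{E}_{x,K}(F(X_1, \dots, X_n)) = \mathbb{P}_{x,K}((X_1, \dots, X_n) \in J(A, c)),$$

Lemma 2 and (6) the assertion is proven. □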



The following result from [10] gives us a Hoeffding inequality for uniformly 
ergodic Markov chains. 

Proposition 2 (Hoeffding inequality for uniformly ergodic Markov chains)
Assume that the transition kernel $K$ is uniformly ergodic with $(\alpha, M)$. Let
$X_1, X_2, \dots, X_n$ be given by a Markov chain with transition kernel $K$ and
$X_0 = x$. Then for any $A \in \mathcal{B}(G)$ and $c > 0$ we obtain

$$\mathbb{P}_{x,K}\left[ \left| \frac{1}{n} \sum_{i=1}^{n} 1_{X_i \in A} - \pi(A) \right| \ge c \right] \le 2 \exp\left( -\frac{(1 - \alpha)^2}{M^2} \cdot \frac{\left( nc - \frac{2M}{1 - \alpha} \right)^2}{8n} \right),$$

where $n \ge \frac{4M}{(1 - \alpha) c}$.

Proof. Note that the first inequality in the proof of [10, Theorem 2] with our
notation and $f_c = 1_A - \pi(A)$ is given by

$$|P^n f_c(x)| \le \|P^n - \mathbb{E}_\pi\|_{L_\infty \to L_\infty}.$$

By Proposition 1 we obtain $|P^n f_c(x)| \le 2M\alpha^n$. Then the conclusion follows
by the same steps as the proof of [10, Theorem 2]. □

4.1 Monte Carlo rate of convergence

We now show that for every starting point $x_0$ and every $n$ there exists a finite
sequence $u_1, u_2, \dots, u_n \in [0,1]^s$ such that the discrepancy of the corresponding
Markov chain converges approximately with order $n^{-1/2}$. The main idea
to prove the existence result is to use probabilistic arguments. We apply a
Hoeffding inequality for Markov chains to the local discrepancy function for
a fixed test set to show that the probability of point sets with small local
discrepancy is large. We then extend this result to the local discrepancy
for all sets in the $\delta$-cover and finally to all test sets. Using Corollary 2 we
are also able to obtain a result for the push-back discrepancy of the driver
sequence. The result shows that if the finite driver sequence is chosen at
random from the uniform distribution, most choices satisfy the Monte Carlo
rate of convergence of the discrepancy for the induced point set $P_n$.

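Theorem 2 Let $K$ be a transition kernel with stationary distribution $\pi$
defined on a set $G \subseteq \mathbb{R}^d$. Assume that the transition kernel is uniformly ergodic
with $(\alpha, M)$. Let $\mathscr{A} \subseteq \mathcal{B}(G)$ be a set of test sets. Assume that for every $\delta > 0$
there exists a set $\Gamma_\delta \subseteq \mathcal{B}(G)$ with $|\Gamma_\delta| < \infty$ such that $\Gamma_\delta$ is a $\delta$-cover of $\mathscr{A}$ with
respect to $\pi$. Let $\varphi$ be an update function for $K$. Then, for any $x_0 = x$ there
exists a driver sequence $u_1, u_2, \dots, u_n \in [0,1]^s$ such that $P_n = \{x_1, \dots, x_n\}$
given by

$$x_i = x_i(x_0) = \varphi(x_{i-1}; u_i) = \varphi_i(x_0; u_1, \dots, u_i), \quad i = 1, \dots, n,$$

satisfies

$$D^*_{\mathscr{A},\pi}(P_n) \le \frac{8M}{1 - \alpha} \frac{\sqrt{\log |\Gamma_\delta|}}{\sqrt{n}} + \delta.$$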

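Proof. Let $A \in \mathscr{A}$ and $x_0 = x \in G$. By Lemma 3 and Proposition 2 we
obtain for any $c_n \ge \frac{4M}{n(1 - \alpha)}$ that

$$\mathbb{P}\left[ |\Delta_{n,A,\varphi,x}| \le c_n \right] \ge 1 - 2 \exp\left( -\frac{(1 - \alpha)^2 n c_n^2}{32 M^2} \right). \qquad (9)$$

Let

$$\hat{\Gamma}_\delta = \{D \setminus C : C \subseteq A \subseteq D,\ C, D \in \Gamma_\delta\}.$$

Set $m = |\hat{\Gamma}_\delta|$. If we have for all $A \in \hat{\Gamma}_\delta$ that

$$\mathbb{P}\left[ |\Delta_{n,A,\varphi,x}| \le c_n \right] > 1 - \frac{1}{m}, \qquad (10)$$

then there exists a finite sequence $u_1, \dots, u_n \in [0,1]^s$ such that

$$\max_{A \in \hat{\Gamma}_\delta} |\Delta_{n,A,\varphi,x}| \le c_n. \qquad (11)$$

By (9) we obtain for

$$c_n = \frac{4M}{1 - \alpha} \frac{\sqrt{2 \log(2m)}}{\sqrt{n}}$$

that (10) holds and we get the desired result for any $A \in \hat{\Gamma}_\delta$. Now we
extend the result from $\hat{\Gamma}_\delta$ to $\mathscr{A}$. For $A \in \mathscr{A}$ there are $C, D \in \Gamma_\delta$ such that
$C \subseteq A \subseteq D$ and $\pi(D \setminus C) \le \delta$, since $\Gamma_\delta$ is a $\delta$-cover. Hence we get

$$\left| \frac{1}{n} \sum_{i=1}^{n} \left[ 1_{(u_1, \dots, u_i) \in B_i(x, A)} - \pi(A) \right] \right|$$
$$= \left| \frac{1}{n} \sum_{i=1}^{n} \left[ 1_{(u_1, \dots, u_i) \in B_i(x, D)} - \pi(D) \right] - \frac{1}{n} \sum_{i=1}^{n} \left[ 1_{(u_1, \dots, u_i) \in B_i(x, D \setminus A)} - \pi(D \setminus A) \right] \right|$$
$$\le \left| \frac{1}{n} \sum_{i=1}^{n} \left[ 1_{(u_1, \dots, u_i) \in B_i(x, D)} - \pi(D) \right] \right| + \left| \frac{1}{n} \sum_{i=1}^{n} \left[ 1_{(u_1, \dots, u_i) \in B_i(x, D \setminus A)} - \pi(D \setminus A) \right] \right|.$$

Set

$$I_1 = \left| \frac{1}{n} \sum_{i=1}^{n} \left[ 1_{(u_1, \dots, u_i) \in B_i(x, D)} - \pi(D) \right] \right|$$

and

$$I_2 = \left| \frac{1}{n} \sum_{i=1}^{n} \left[ 1_{(u_1, \dots, u_i) \in B_i(x, D \setminus A)} - \pi(D \setminus A) \right] \right|.$$

By $D \in \hat{\Gamma}_\delta$ we have

$$I_1 \le \max_{A \in \hat{\Gamma}_\delta} |\Delta_{n,A,\varphi,x}| \le c_n.$$

Furthermore

$$I_2 = \left| \frac{1}{n} \sum_{i=1}^{n} 1_{(u_1, \dots, u_i) \in B_i(x, D \setminus A)} - \pi(D \setminus C) + \pi(D \setminus C) - \pi(D \setminus A) \right|$$
$$\le \left| \frac{1}{n} \sum_{i=1}^{n} \left[ 1_{(u_1, \dots, u_i) \in B_i(x, D \setminus C)} - \pi(D \setminus C) \right] \right| + \left| \pi(D \setminus C) - \pi(D \setminus A) \right|$$
$$\le c_n + \delta.$$

The last inequality follows by the $\delta$-cover property, (11) and the fact that
$D \setminus C \in \hat{\Gamma}_\delta$. Finally note that $m = |\hat{\Gamma}_\delta| \le |\Gamma_\delta|^2 / 2$ which completes the proof.
□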

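Using Corollary 2 we can also state Theorem 2 in terms of the driver
sequence.

Corollary 4 Let $K$ be a transition kernel with stationary distribution $\pi$
defined on a set $G \subseteq \mathbb{R}^d$. Assume that the transition kernel is uniformly ergodic
with $(\alpha, M)$. Let $\mathscr{A} \subseteq \mathcal{B}(G)$ be a set of test sets. Assume that for every $\delta > 0$
there exists a set $\Gamma_\delta \subseteq \mathcal{B}(G)$ with $|\Gamma_\delta| < \infty$ such that $\Gamma_\delta$ is a $\delta$-cover of $\mathscr{A}$
with respect to $\pi$. Let $\varphi$ be an update function for $K$. Then for any $x_0 = x$
there exists a driver sequence $u_1, u_2, \dots, u_n \in [0,1]^s$ such that

$$D^*_{\mathscr{A},\varphi}(U_n) \le \frac{8M}{1 - \alpha} \frac{\sqrt{\log |\Gamma_\delta|}}{\sqrt{n}} + \delta + \frac{\alpha M}{n(1 - \alpha)}.$$

Let $P_n = \{x_1, \dots, x_n\}$ be given by

$$x_i = x_i(x_0) = \varphi(x_{i-1}; u_i) = \varphi_i(x_0; u_1, \dots, u_i), \quad i = 1, \dots, n.$$

Then $P_n$ satisfies

$$D^*_{\mathscr{A},\pi}(P_n) \le \frac{8M}{1 - \alpha} \frac{\sqrt{\log |\Gamma_\delta|}}{\sqrt{n}} + \delta + \frac{2\alpha M}{n(1 - \alpha)}.$$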

This corollary has two consequences. One is the existence of a driver 
sequence with small push-back discrepancy. The second is that if one can 
construct such a sequence with small push-back discrepancy, then the Markov 
chain which one obtains using this driver sequence also has small discrepancy. 
Thus the push-back discrepancy is a sufficient criterion for the construction 
of good driver sequences. 

Theorem [2] and Corollary H] depend on 5 and the size of the 5-cover IV 
For a certain set of test sets we have the following result. 



19 



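Corollary 5 Let $K$ be a transition kernel with stationary distribution $\pi$
defined on a set $G \subseteq \mathbb{R}^d$. Assume that the transition kernel is
uniformly ergodic with $(\alpha, M)$. Let the set of test sets
$\mathscr{A} \subseteq \mathcal{B}(G)$ be given by

\[
  \mathscr{A} = \{(-\infty, x) \cap G \mid x \in \bar{\mathbb{R}}^d\},
\]

where $\bar{\mathbb{R}}^d = (\mathbb{R} \cup \{\infty, -\infty\})^d$. Let $\varphi$ be
an update function for $K$. Then for any $x_0 = x$ there exists a driver
sequence $u_1, u_2, \dots, u_n \in [0,1]^s$ and an absolute constant $c > 0$ such
that the push-back discrepancy of the driver sequence is bounded by

\[
  \frac{8M}{1-\alpha}\,\frac{\sqrt{d\log(3 + 4c^2 n)}}{\sqrt{n}}
  + \frac{\sqrt{d}}{\sqrt{n}} + \frac{\alpha M}{n(1-\alpha)}.
\]

Let $P_n = \{x_1, \dots, x_n\}$ be given by

\[
  x_i = x_i(x) = \varphi(x_{i-1}; u_i) = \varphi_i(x; u_1, \dots, u_i),
  \qquad i = 1, \dots, n.
\]

Then $P_n$ satisfies

\[
  D^*_{\mathscr{A},\pi}(P_n) \le
  \frac{8M}{1-\alpha}\,\frac{\sqrt{d\log(3 + 4c^2 n)}}{\sqrt{n}}
  + \frac{\sqrt{d}}{\sqrt{n}} + \frac{2\alpha M}{n(1-\alpha)}.
\]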

Proof. The result follows by Lemma 4 in Appendix A, which shows the
existence of $\delta$-covers with

\[
  |\Gamma_\delta| \le (3 + 4c^2 d\,\delta^{-2})^d,
\]

and Corollary 4 applied with $\delta = d^{1/2} n^{-1/2}$. □
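To get a feeling for the size of the bound on the point set $P_n$, the following Python sketch evaluates the three terms $\frac{8M}{1-\alpha}\sqrt{d\log(3+4c^2 n)/n} + \sqrt{d/n} + \frac{2\alpha M}{n(1-\alpha)}$ for a few values of $n$. The absolute constant $c$ is not specified in the text, so `c = 1.0` is only a placeholder assumption, and the function name is hypothetical.

```python
import math


def corollary5_point_set_bound(n: int, d: int, alpha: float, M: float,
                               c: float = 1.0) -> float:
    """Evaluate the three terms of the discrepancy bound for P_n.

    c stands in for the unspecified absolute constant; c = 1.0 is a placeholder.
    """
    cover_term = (8.0 * M / (1.0 - alpha)) * math.sqrt(
        d * math.log(3.0 + 4.0 * c * c * n)) / math.sqrt(n)
    delta_term = math.sqrt(d) / math.sqrt(n)
    burn_in_term = 2.0 * alpha * M / (n * (1.0 - alpha))
    return cover_term + delta_term + burn_in_term


for n in (10**2, 10**4, 10**6):
    print(n, corollary5_point_set_bound(n, d=1, alpha=0.0, M=1.0))
```

With these placeholder values the expression exceeds 1 for small $n$, so it only becomes informative for large $n$; the point of the sketch is the decay roughly like $\sqrt{\log n / n}$.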

In the next subsection we show that the exponent $-1/2$ of $n$ in the
bound in Theorem 2 cannot be improved in general.



4.2 Optimality of the Monte Carlo rate 

We now show that the exponent $-1/2$ of $n$ in Theorem 2 cannot be improved
in general. We do so by specializing Theorem 2 to the sphere $\mathbb{S}^d$. Recall that
$\mathbb{S}^d = \{x \in \mathbb{R}^{d+1} : \|x\|_2 = \langle x, x\rangle^{1/2} = 1\}$, where
$\langle x, y\rangle$ denotes the standard inner product in $\mathbb{R}^{d+1}$. A spherical cap
$C(x,t) \subseteq \mathbb{S}^d$ with center $x \in \mathbb{S}^d$ and $-1 \le t \le 1$ is given by

\[
  C(x,t) = \{y \in \mathbb{S}^d : \langle x, y\rangle > t\}.
\]
Let $\mathcal{C} = \{C(x,t) : x \in \mathbb{S}^d, -1 \le t \le 1\}$ be the test set of
spherical caps of $\mathbb{S}^d$. The normalized area of a spherical cap $C(x,t)$ for
$0 \le t \le 1$ is given by

\[
  \pi(C(x,t)) = \frac{1}{2}\,\frac{B(1-t^2; d/2, 1/2)}{B(1; d/2, 1/2)},
\]

where $B$ is the incomplete beta function

\[
  B(1-t^2; d/2, 1/2) = \int_0^{1-t^2} z^{d/2-1} (1-z)^{-1/2} \, dz.
\]
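The cap-area formula can be checked numerically. The sketch below, in Python, evaluates $B(x; d/2, 1/2)$ after the substitution $z = \sin^2\theta$, which turns the integral into $2\int_0^{\arcsin\sqrt{x}} \sin^{d-1}\theta\,d\theta$ and removes the endpoint singularity. The midpoint rule, the step count and the function names are choices made for this illustration only. For $d = 2$ the result can be compared with the elementary value $(1-t)/2$, and for $d = 1$ with $\arccos(t)/\pi$.

```python
import math


def incomplete_beta_half(x: float, d: int, steps: int = 2000) -> float:
    """B(x; d/2, 1/2) = 2 * int_0^{arcsin(sqrt(x))} sin(theta)^(d-1) d(theta),
    approximated with the midpoint rule."""
    theta_max = math.asin(math.sqrt(x))
    h = theta_max / steps
    total = sum(math.sin((k + 0.5) * h) ** (d - 1) for k in range(steps))
    return 2.0 * h * total


def normalized_cap_area(t: float, d: int) -> float:
    """Normalized area of the cap C(x, t) on the d-sphere, for 0 <= t <= 1."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("the formula is stated for 0 <= t <= 1")
    return 0.5 * incomplete_beta_half(1.0 - t * t, d) / incomplete_beta_half(1.0, d)


print(normalized_cap_area(0.3, 2), (1.0 - 0.3) / 2.0)
print(normalized_cap_area(0.3, 1), math.acos(0.3) / math.pi)
```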

Then the spherical cap discrepancy of a point set
$P_n = \{x_1, x_2, \dots, x_n\} \subseteq \mathbb{S}^d$ is given by

\[
  D^*_{\mathbb{S}^d,\mathcal{C}}(P_n) = \sup_{C \in \mathcal{C}}
  \left| \frac{1}{n} \sum_{i=1}^{n} 1_{x_i \in C} - \pi(C) \right|.
\]

The following result is an application of Theorem 2. For a proof of the
corollary we refer to Subsection C in the Appendix.

Corollary 6 There exists an absolute constant $c > 0$ independent of $n$ and $d$
such that for each $n$ and $d$ there exists a set of points
$P_n = \{x_1, x_2, \dots, x_n\} \subseteq \mathbb{S}^d$ such that the spherical cap
discrepancy satisfies

\[
  D^*_{\mathbb{S}^d,\mathcal{C}}(P_n) \le c\,\frac{\sqrt{d} + \sqrt{(d+1)\log n}}{\sqrt{n}}.
\]

We have shown the existence of points on $\mathbb{S}^d$ for which the spherical
cap discrepancy is of order $\sqrt{\log n / n}$. Since this result follows by specializing
Theorem 2 to the sphere, any improvement of the exponent $-1/2$ of $n$ in
Theorem 2 would yield an improvement of the exponent of $n$ in Corollary 6.
However, it is known that the spherical cap discrepancy of any point set
is at least $n^{-1/2-1/(2d)}$, see [3]. Thus, an exponent smaller than $-1/2$ in
Corollary 6 would yield a contradiction to the lower bound on the spherical
cap discrepancy for large enough $d$. Thus, at this level of generality, the
exponent of $n$ in Theorem 2 cannot be improved.

We point out that a bound on the spherical cap discrepancy can also be
deduced from [13, Theorem 4] by using a bound on the Vapnik-Červonenkis
dimension for $\mathcal{C}$.






Acknowledgements 



J. D. was supported by a Queen Elizabeth 2 Fellowship from the Australian 
Research Council. D. R. was supported by an Australian Research Council 
Discovery Project (F. Kuo and I. Sloan), by the DFG priority program 1324 
and the DFG Research training group 1523. H. Z. was supported by a PhD 
scholarship from the University of New South Wales. 



References 

[1] R. Adamczak, A tail inequality for suprema of unbounded empirical pro- 
cesses with applications to Markov chains, Electron. J. Probab., 13, 
1000-1034, 2008. 

[2] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc. 
68, 337-404, 1950. 

[3] J. Beck, Sums of distances between points on a sphere - an application
of the theory of irregularities of distribution to discrete geometry,
Mathematika, 31, 33-41, 1984.

[4] J. Brauchart and J. Dick, A characterization of Sobolev spaces on the
sphere and an extension of Stolarsky's invariance principle to arbitrary
smoothness, submitted, 2012.

[5] S. Chen, Consistency and convergence rate of Markov chain quasi-Monte 
Carlo with examples, PhD thesis, Stanford University, 2011. 

[6] S. Chen, J. Dick and A. Owen, Consistency of Markov chain quasi- 
Monte Carlo on continuous state spaces, Ann. Statist., 39, 673-701, 
2011. 

[7] S. Chen, M. Matsumoto, T. Nishimura and A. Owen, New inputs and
methods for Markov chain quasi-Monte Carlo, In Monte Carlo and
Quasi-Monte Carlo Methods in Scientific Computing, (L. Plaskota and
H. Woźniakowski, eds.), 313-327, Springer, New York, 2012.

[8] J. Dick and F. Pillichshammer, Digital nets and sequences: Discrep- 
ancy Theory and Quasi-Monte Carlo integration, Cambridge University 
Press, Cambridge, 2010. 






[9] K. Fang and Y. Wang, Number-theoretic Methods in Statistics, Chapman
& Hall, London, 1994.

[10] P. Glynn and D. Ormoneit, Hoeffding's inequality for uniformly ergodic
Markov chains, Statist. Probab. Lett., 56, 143-146, 2002.

[11] M. Gnewuch, Bracketing numbers for axis-parallel boxes and applications
to geometric discrepancy, J. Complexity, 24, 154-172, 2008.

[12] D. Haussler, Sphere packing numbers for subsets of the Boolean n-cube
with bounded Vapnik-Chervonenkis dimension, J. Combin. Theory Ser.
A, 69, 217-232, 1995.

[13] S. Heinrich, E. Novak, G. Wasilkowski and H. Woźniakowski, The
inverse of the star-discrepancy depends linearly on the dimension, Acta
Arith., 96, 279-302, 2001.

[14] P. Leopardi, Diameter bounds for equal area partitions of the unit sphere,
Electron. Trans. Numer. Anal., 35, 1-16, 2009.

[15] P. L'Ecuyer, C. Lécot and B. Tuffin, A randomized quasi-Monte Carlo
simulation method for Markov chains, Oper. Res., 56, 958-975, 2008.

[16] C. Lemieux and P. Sidorsky, Exact sampling with highly uniform point
sets, Math. Comput. Modelling, 43, 339-349, 2006.

[17] L. Liao, Variance reduction in Gibbs sampler using quasi random
numbers, J. Comput. Graph. Statist., 7, 253-266, 1998.

[18] P. Mathé and E. Novak, Simple Monte Carlo and the Metropolis
algorithm, J. Complexity, 23, 673-696, 2007.

[19] K. Mengersen and R. Tweedie, Rates of Convergence of the Hastings
and Metropolis Algorithms, Ann. Statist., 24, 101-121, 1996.

[20] S. Meyn and R. Tweedie, Markov chains and stochastic stability, second
ed., Cambridge University Press, 2009.

[21] B. Miasojedow, Hoeffding's inequalities for geometrically ergodic
Markov chains on general state space, Preprint, Available at
http://arxiv.org/abs/1201.2265.






[22] A. Mira and L. Tierney, Efficiency and convergence properties of slice 
samplers, Scand. J. Statist., 29, 1-12, 2002. 

[23] R. Neal, Slice sampling, Ann. Statist., 31, 705-767, 2003. 

[24] D. Paulin, Concentration inequalities for Markov chains by Marton cou- 
plings, Preprint, Available at http://arxiv.org/abs/1212.2015. 

[25] A. Owen and S. Tribble, A quasi-Monte Carlo Metropolis algorithm, 
Proc. Natl. Acad. Sci. USA, 102, 2005. 

[26] C. Robert and G. Casella, Monte Carlo Statistical Methods, second ed., 
Springer, New York, 2004. 

[27] G. Roberts and J. Rosenthal, Geometric ergodicity and hybrid Markov 
chains, Electron. Comm. Probab., 2, 13-25, 1997. 

[28] G. Roberts and J. Rosenthal, On convergence rates of Gibbs samplers 
for uniform distributions, Ann. Appl. Probab., 8, 1291-1302, 1998. 

[29] G. Roberts and J. Rosenthal, General state space Markov chains and 
MCMC algorithms, Prob. Surv., 1, 20-71, 2004. 

[30] D. Rudolf, Explicit error bounds for Markov chain Monte Carlo, Disser- 
tationes Math., 485, 93 pp., 2012. 

[31] R. Smith, Efficient Monte Carlo Procedures for Generating Points Uni- 
formly Distributed over Bounded Regions, Oper. Res., 32, 1296-1308, 
1984. 

[32] I. Sobol, Pseudo-random numbers for constructing discrete Markov
chains by the Monte Carlo method, USSR Comput. Math. Math. Phys.,
14, 36-45, 1974.

[33] M. Talagrand, Sharper bounds for Gaussian and empirical processes, 
Ann. Probab., 22, 28-76, 1994. 

[34] S. Tribble, Markov chain Monte Carlo algorithms using completely uni- 
formly distributed driving sequences, Ph.D. thesis, Stanford University, 
2007. 

[35] S. Tribble and A. Owen, Construction of weakly CUD sequences for 
MCMC sampling, Electron. J. Stat., 2, 634-660, 2008. 






Appendix 



A Delta-covers 

We need a deep result of the theory of empirical processes which follows
from Talagrand [33, Theorem 6.6] and Haussler [12, Corollary 1]. For a more
general version see also [13, Theorem 4].

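Proposition 3 There exists an absolute constant $c > 0$ such that for each
cumulative distribution function $L$ on $(G, \mathcal{B}(G))$ the following holds: For all
$r \in \mathbb{N}$ there exist $y_1, \dots, y_r \in G$ with

\[
  \sup_{x \in \bar{\mathbb{R}}^d}
  \left| L(x) - \frac{1}{r} \sum_{i=1}^{r} 1_{y_i \in (-\infty, x)_G} \right|
  \le c \sqrt{d}\, r^{-1/2}.
\]

In this subsection we study $\delta$-covers in $G$ with respect to the probability
measure $\pi$.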

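Lemma 4 Let $G \subseteq \mathbb{R}^d$ and let $(G, \mathcal{B}(G), \pi)$ be a probability space where
$\mathcal{B}(G)$ is the Borel $\sigma$-algebra of $G$. Define the set $\mathscr{A} \subseteq \mathcal{B}(G)$ of test sets by

\[
  \mathscr{A} = \{(-\infty, x)_G : x \in \bar{\mathbb{R}}^d\}.
\]

Then for any $\delta > 0$ there exists a $\delta$-cover $\Gamma_\delta$ of $\mathscr{A}$ with

\[
  |\Gamma_\delta| \le (3 + 4c^2 d\,\delta^{-2})^d,
\]

where $c > 0$ is an absolute constant.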

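Proof. Let $\delta > 0$ be given and let $r \in \mathbb{N}$ be the smallest integer such that
$2c\sqrt{d}\, r^{-1/2} \le \delta$. By Proposition 3 there are points $y_1, \dots, y_r \in G$ such that

\[
  \sup_{x \in \bar{\mathbb{R}}^d}
  \left| \pi((-\infty, x)_G) - \frac{1}{r} \sum_{i=1}^{r} 1_{y_i \in (-\infty, x)_G} \right|
  \le \frac{\delta}{2}. \tag{12}
\]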



Let $y_i = (\eta_{i,1}, \dots, \eta_{i,d})$. We now define the set

d 
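The cardinality of $\Gamma_\delta$ satisfies

\[
  |\Gamma_\delta| = (2 + r)^d \le (3 + 4c^2 d\,\delta^{-2})^d.
\]

It remains to show that $\Gamma_\delta$ is a $\delta$-cover of $\mathscr{A}$.

Let $z \in \bar{\mathbb{R}}^d$ be arbitrary. Then there exist
$(-\infty, x)_G, (-\infty, y)_G \in \Gamma_\delta$ such that

\[
  (-\infty, x]_G \subseteq (-\infty, z)_G \subseteq (-\infty, y)_G
\]

and

\[
  (-\infty, x]_G \cap \{y_1, \dots, y_r\} = (-\infty, z)_G \cap \{y_1, \dots, y_r\}
  = (-\infty, y)_G \cap \{y_1, \dots, y_r\}.
\]

Using (12) we obtain

\begin{align*}
  \pi\big((-\infty, y)_G \setminus (-\infty, x]_G\big)
  &\le \left| \pi((-\infty, y)_G) - \frac{1}{r} \sum_{i=1}^{r} 1_{y_i \in (-\infty, y)_G} \right|
     + \left| \pi((-\infty, x]_G) - \frac{1}{r} \sum_{i=1}^{r} 1_{y_i \in (-\infty, x]_G} \right| \\
  &\le \delta.
\end{align*}

Thus $\Gamma_\delta$ is a $\delta$-cover. □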
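The bracketing idea used in the proof can be illustrated in one dimension. The Python sketch below takes $G = [0,1]$ with $\pi$ the uniform distribution and uses an equally spaced grid as cover, so that every test set $[0,z)$ is enclosed between two cover sets $[0,a) \subseteq [0,z) \subseteq [0,b)$ with $\pi([0,b) \setminus [0,a)) = b - a \le \delta$. The grid construction and the function names are assumptions for this illustration; the proof above builds its cover from the points $y_1, \dots, y_r$ instead.

```python
import math
from typing import List, Tuple


def grid_cover(delta: float) -> List[float]:
    """Right endpoints k/m, k = 0..m, with m = ceil(1/delta).

    The cover consists of the intervals [0, k/m)."""
    m = math.ceil(1.0 / delta)
    return [k / m for k in range(m + 1)]


def bracket(z: float, endpoints: List[float]) -> Tuple[float, float]:
    """Return (a, b) from the grid with [0, a) inside [0, z) inside [0, b)."""
    m = len(endpoints) - 1
    return endpoints[math.floor(z * m)], endpoints[math.ceil(z * m)]


delta = 0.1
endpoints = grid_cover(delta)
test_points = [0.0, 0.05, 0.37, 0.5, 0.999, 1.0]

# Uniform measure of the gap between outer and inner bracket, per test set.
gaps = [b - a for a, b in (bracket(z, endpoints) for z in test_points)]
worst_gap = max(gaps)
print(len(endpoints), worst_gap)
```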



B Integration error 

In Appendix A we considered test sets which are intersections of boxes with
the state space $G$. We define a reproducing kernel $Q$ by

\[
  Q(x,y) = 1 + \int_G 1_{(-\infty,z)_G}(x)\, 1_{(-\infty,z)_G}(y)\, \rho(dz),
\]

where $\rho$ is a measure on $G$ with $\int_G \rho(dz) < \infty$. The function $Q$ is symmetric,
$Q(x,y) = Q(y,x)$, and positive semi-definite, that is, for any $x_1, \dots, x_n \in G$
and complex numbers $b_1, \dots, b_n \in \mathbb{C}$ we have

\[
  \sum_{k,\ell=1}^{n} b_k \bar{b}_\ell\, Q(x_k, x_\ell)
  = \left| \sum_{k=1}^{n} b_k \right|^2
  + \int_G \left| \sum_{k=1}^{n} b_k 1_{(-\infty,z)_G}(x_k) \right|^2 \rho(dz) \ge 0,
\]






where $\bar{b}_\ell$ denotes the complex conjugate of $b_\ell$. Thus $Q$ uniquely defines a
reproducing kernel Hilbert space $H_2 = H_2(Q)$ of functions defined on $G$.
See [2] for more information on reproducing kernels and reproducing kernel
Hilbert spaces. In fact, the functions $f$ in $H_2$ permit the representation

\[
  f(x) = f_0 + \int_G 1_{(-\infty,z)_G}(x)\, \tilde{f}(z)\, \rho(dz), \tag{13}
\]

for some $f_0 \in \mathbb{C}$ and function $\tilde{f} \in L_2(G, \rho)$, which can for instance be shown
using the same arguments as in [4, Appendix A]. The inner product in $H_2$ is
given by

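\[
  \langle f, g \rangle = f_0 \bar{g}_0 + \int_G \tilde{f}(z)\, \overline{\tilde{g}(z)}\, \rho(dz).
\]

With these definitions we have the reproducing property

\[
  \langle f, Q(\cdot, y) \rangle = f_0 + \int_G \tilde{f}(z)\, 1_{(-\infty,z)_G}(y)\, \rho(dz) = f(y).
\]

For $1 \le q \le \infty$ we also define the space $H_q$ of functions of the form (13)
for which $\tilde{f} \in L_q(G, \rho)$, with norm

\[
  \|f\|_{H_q} = \left( |f_0|^q + \int_G |\tilde{f}(z)|^q \rho(dz) \right)^{1/q}.
\]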

We provide a simple example.

Example 4 Let $G = [0,1]$ and let $\rho$ be the Lebesgue measure, then

\[
  Q(x,y) = 1 + \int_0^1 1_{[0,z)}(x)\, 1_{[0,z)}(y)\, dz = 1 + \min\{1-x, 1-y\}.
\]

The function $\tilde{f} = -f'$, where $f'$ is the usual derivative of $f$, and (13) is then

\[
  f(x) = f_0 + \int_0^1 1_{[0,z)}(x)\, \tilde{f}(z)\, dz = f_0 - \int_x^1 f'(z)\, dz = f_0 + f(x) - f(1).
\]

Thus $f(1) = f_0$ and $H_q$ is the space of all absolutely continuous functions $f$ for
which $f' \in L_q([0,1], \rho)$.
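The closed form of the kernel in Example 4 is easy to confirm numerically. The following Python sketch approximates $\int_0^1 1_{[0,z)}(x)\,1_{[0,z)}(y)\,dz$ by a midpoint rule and compares it with $\min\{1-x, 1-y\}$. The function names and the number of quadrature nodes are choices made for this illustration.

```python
def kernel_closed_form(x: float, y: float) -> float:
    """Q(x, y) = 1 + min(1 - x, 1 - y) on G = [0, 1] with Lebesgue measure."""
    return 1.0 + min(1.0 - x, 1.0 - y)


def kernel_by_quadrature(x: float, y: float, steps: int = 10000) -> float:
    """Q(x, y) = 1 + int_0^1 1[x < z] * 1[y < z] dz via the midpoint rule."""
    h = 1.0 / steps
    hits = sum(1 for k in range(steps)
               if x < (k + 0.5) * h and y < (k + 0.5) * h)
    return 1.0 + hits * h


for x, y in [(0.2, 0.7), (0.9, 0.1), (0.5, 0.5)]:
    print(x, y, kernel_closed_form(x, y), kernel_by_quadrature(x, y))
```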

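We have the following result concerning the integration error in $H_q$.

Theorem 3 Let $G \subseteq \mathbb{R}^d$ and $\pi$ be a probability measure on $G$. Further let
$\mathscr{A} = \{(-\infty, x)_G : x \in G\}$. We assume that $1 \le p, q \le \infty$ with $1/p + 1/q = 1$.
Then for $P_n = \{x_1, x_2, \dots, x_n\} \subseteq G$ and for all $f \in H_q$ we have

\[
  \left| \int_G f(z)\, \pi(dz) - \frac{1}{n} \sum_{i=1}^{n} f(x_i) \right|
  \le \|f\|_{H_q}\, D^*_{p,\mathscr{A},\pi}(P_n),
\]

where

\[
  D^*_{p,\mathscr{A},\pi}(P_n) = \left( \int_G \left| \int_G 1_{(-\infty,z)_G}(y)\, \pi(dy)
  - \frac{1}{n} \sum_{i=1}^{n} 1_{(-\infty,z)_G}(x_i) \right|^p \rho(dz) \right)^{1/p}
\]

and for $p = \infty$ let

\[
  D^*_{\infty,\mathscr{A},\pi}(P_n) := D^*_{\mathscr{A},\pi}(P_n) = \sup_{z \in G}
  \left| \int_G 1_{(-\infty,z)_G}(y)\, \pi(dy) - \frac{1}{n} \sum_{i=1}^{n} 1_{(-\infty,z)_G}(x_i) \right|.
\]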



Proof. Let

\[
  e(f, P_n) = \int_G f(z)\, \pi(dz) - \frac{1}{n} \sum_{i=1}^{n} f(x_i)
\]

denote the quadrature error when approximating the integral $\int_G f(z)\, \pi(dz)$
by $\frac{1}{n} \sum_{i=1}^{n} f(x_i)$, where $P_n = \{x_1, x_2, \dots, x_n\}$.

Let $h(x) = \int_G Q(x,y)\, \pi(dy) - \frac{1}{n} \sum_{i=1}^{n} Q(x, x_i)$, then we have

\[
  h(x) = \int_G 1_{(-\infty,z)_G}(x) \left( \int_G 1_{(-\infty,z)_G}(y)\, \pi(dy)
  - \frac{1}{n} \sum_{i=1}^{n} 1_{(-\infty,z)_G}(x_i) \right) \rho(dz)
\]

and therefore $h \in H_p$ for any $1 \le p \le \infty$. Let

\[
  \tilde{h}(z) = \int_G 1_{(-\infty,z)_G}(y)\, \pi(dy) - \frac{1}{n} \sum_{i=1}^{n} 1_{(-\infty,z)_G}(x_i).
\]

Further, for $f \in H_q$ we have $\tilde{f} \in L_q(G, \rho)$ and thus

\[
  e(f, P_n) = \int_G \tilde{f}(z)\, \tilde{h}(z)\, \rho(dz).
\]

Using Hölder's inequality we have

\begin{align*}
  |e(f, P_n)| &\le \int_G |\tilde{f}(z)|\, |\tilde{h}(z)|\, \rho(dz) \\
  &\le \left( \int_G |\tilde{f}(z)|^q \rho(dz) \right)^{1/q}
       \left( \int_G |\tilde{h}(z)|^p \rho(dz) \right)^{1/p},
\end{align*}

where $1 \le p, q \le \infty$ are Hölder conjugates, $1/p + 1/q = 1$, with the obvious
modifications for $p, q = \infty$. Thus the result follows. □

Thus we can use the bounds from the theorems above to obtain a bound
on the integration error $|e(f, P_n)|$, where $P_n$ is the set of points from the
Markov chain, for functions $f$ with representation (13) and $\|f\|_{H_1} < \infty$.
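A one-dimensional instance of Theorem 3 can be checked directly. The Python sketch below takes $G = [0,1]$, $\pi$ uniform, $\rho$ the Lebesgue measure, $p = \infty$ and $q = 1$, and the test function $f(x) = x^2$. By Example 4, $f_0 = f(1) = 1$ and $\tilde{f} = -f'$, so $\|f\|_{H_1} = 1 + \int_0^1 2z\,dz = 2$. The point set (midpoints of an equal partition) and the function names are assumptions for this illustration, not the Markov chain points of the paper.

```python
from typing import List


def star_discrepancy_1d(points: List[float]) -> float:
    """sup over z in [0, 1] of |z - #{x_i < z} / n| for points in [0, 1]."""
    xs = sorted(points)
    n = len(xs)
    # The supremum is approached at a point x_(i), from the left or the right.
    return max(max(abs(x - i / n), abs(x - (i + 1) / n))
               for i, x in enumerate(xs))


n = 4
points = [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]  # 1/8, 3/8, 5/8, 7/8

exact_integral = 1.0 / 3.0                      # integral of x^2 over [0, 1]
sample_average = sum(x * x for x in points) / n
error = abs(exact_integral - sample_average)

norm_h1 = 2.0                                   # |f(1)| + integral of |f'|
bound = norm_h1 * star_discrepancy_1d(points)

print(error, bound)
```

Here the discrepancy equals $1/(2n)$, so the bound is $1/n$, while the actual error is much smaller; the inequality is an upper bound and is not expected to be tight for a smooth integrand.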



C Proof of Corollary 6

We use Theorem 2 where $\pi$ is the normalized Lebesgue surface measure on
the sphere $\mathbb{S}^d$. Let $\psi : [0,1]^d \to \mathbb{S}^d$ be an area-preserving mapping from
$[0,1]^d$ to $\mathbb{S}^d$ (i.e., a generator function), see [9], and let the update function
$\varphi : \mathbb{S}^d \times [0,1]^d \to \mathbb{S}^d$ be given by

\[
  \varphi(x, u) = \psi(u).
\]

The transition kernel is given by $K(x, A) = \pi(A)$, which is uniformly ergodic
with $(\alpha, M)$ for $\alpha = 0$ and $M = 1$.

In order to obtain a bound on the spherical cap discrepancy using Theorem 2,
it remains to construct a $\delta$-cover on $\mathbb{S}^d$ of suitable size. We construct
a $\delta$-cover $\Gamma_\delta$ by specifying a set of centers and heights in the following.

Lemma 5 Let $\mathbb{S}^d \subseteq \mathbb{R}^{d+1}$ denote the $d$-dimensional sphere. Let
$\mathcal{C} = \{C(x,t) : x \in \mathbb{S}^d, -1 \le t \le 1\}$ denote the set of spherical caps of $\mathbb{S}^d$.
Then for any $\delta > 0$ there exists a $\delta$-cover $\Gamma_\delta$ of $\mathcal{C}$ with respect to the
normalized surface Lebesgue measure on $\mathbb{S}^d$ with $|\Gamma_\delta| \le c\, d^{d+1} \delta^{-2(d+1)}$,
where $c > 0$ is a constant independent of $d$ and $\delta$.

The result of Corollary 6 follows now from Theorem 2 and Lemma 5 by
setting $\delta = d^{1/2} n^{-1/2}$. The remainder of this subsection is concerned with
the proof of Lemma 5.
Let $y_1, y_2, \dots, y_N \in \mathbb{S}^d$ be given such that

\[
  \sup_{x \in \mathbb{S}^d} \min_{1 \le i \le N} \|x - y_i\|_2 \le c_d N^{-1/d}, \tag{14}
\]

where $c_d > 0$ is a constant depending only on $d$. The existence of such point
sets follows, for instance, from [14]. Therein an equal area partition of $\mathbb{S}^d$
into $N$ parts was shown with diameter bounded by $c_d N^{-1/d}$. Thus by taking
one point in each partition we obtain (14). Indeed, from the proof of
[14, Theorem 2.6] we obtain that the constant $c_d$ can be chosen as

i/d 

I Ll\/ II J. {Ui/ A) \ 

Cd 



8 ^ 8dVV/^ < 8 ■ 3^ < 21, 



where $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt$ denotes the Gamma function. For $x, y \in \mathbb{S}^d$ we
have $\|x - y\|_2^2 = 2(1 - \langle x, y\rangle)$.

Let $v = \langle x, y\rangle$. Then we obtain the following result.

Lemma 6 We have $C(x,t) \subseteq C(y,u)$ if and only if $v = \langle x, y\rangle > u$ and

\[
  t^2 + u^2 + v^2 - 2tuv \ge 1. \tag{15}
\]

Proof. The condition $v > u$ ensures that $x \in C(y,u)$. Let $z \in C(x,t)$, that is,
$\langle z, x\rangle > t$. Then $z \in C(y,u)$ if and only if $\langle z, y\rangle > u$. The point $z$ is furthest
from $y$ (as measured by the Euclidean distance) if it lies on the great circle
containing $x$ and $y$. Assuming that $x, y, z$ all lie on the same great circle such
that $x$ is between $y$ and $z$, we have

\begin{align*}
  \|y - z\|_2 &= 2 \sin\left( \arcsin\frac{\|x - y\|_2}{2} + \arcsin\frac{\|x - z\|_2}{2} \right) \\
  &= \|x - y\|_2 \sqrt{1 - \|x - z\|_2^2/4} + \|x - z\|_2 \sqrt{1 - \|x - y\|_2^2/4}.
\end{align*}

The result now follows by using $\|x - y\|_2^2 = 2(1 - v)$ and $\|x - z\|_2^2 < 2(1 - t)$.
□



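The next lemma gives us a $\delta$-cover of $\mathcal{C}$ with respect to $\pi$.

Lemma 7 Let

\[
  N = \left\lceil \frac{35^d c_d^d}{B^{2d}(1; d/2, 1/2)\, \delta^{2d}} \right\rceil.
\]

Let $M = \lfloor N^{1/d}/c_d \rfloor$ and $T = \{-1 + k/M : k = 0, 1, \dots, 2M\}$. Then the set

\[
  \Gamma_\delta = \{C(y_i, t) : 1 \le i \le N,\ t \in T\}
\]

is a $\delta$-cover of $\mathcal{C}$ with respect to $\pi$.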






Proof. Let $C(x,t) \in \mathcal{C}$ be an arbitrary spherical cap. Let $y_i$ be such that
$\|x - y_i\|_2 \le c_d N^{-1/d} \le 1/M$, thus

\[
  v = \langle x, y_i\rangle = 1 - \frac{\|x - y_i\|_2^2}{2} \ge 1 - \frac{1}{2M^2}.
\]

Let $u, w \in T$ be such that $u + 2/M \le t \le w - 2/M$ and $w - u \le 5/M$.

We now show that for this choice we have $C(x,t) \subseteq C(y_i, u)$. First assume
that $u \ge 0$. Then using (15) with $v \ge 1 - 1/(2M^2)$ and $t - u \ge 2/M$ we
obtain

\begin{align*}
  t^2 + u^2 + v^2 - 2tuv &\ge t^2 - 2tu + u^2 + v^2 \\
  &\ge (t - u)^2 + (1 - 1/(2M^2))^2 \\
  &\ge 4/M^2 + 1 - 1/M^2 + 1/(4M^4) \ge 1.
\end{align*}

Now assume that $u < 0$. Then

\begin{align*}
  t^2 + u^2 + v^2 - 2tuv &\ge t^2 - 2tu + u^2 + v^2 - tu/M^2 \\
  &\ge (t - u)^2 + (1 - 1/(2M^2))^2 - 1/M^2 \\
  &\ge 4/M^2 + 1 - 1/M^2 + 1/(4M^4) - 1/M^2 \ge 1.
\end{align*}

Thus $C(x,t) \subseteq C(y_i, u)$. We now show that $C(y_i, w) \subseteq C(x,t)$. If $w = 1$
we have $C(y_i, w) = \emptyset$, in which case the result holds trivially. Thus we can
assume that $t < 1 - 2/M$, which implies that $y_i \in C(x,t)$. We use (15) again
with $v \ge 1 - 1/(2M^2)$ and $|w - t| \ge 1/M$. Thus the result follows by the
same arguments as in the previous case.

Thus we have $C(y_i, w) \subseteq C(x,t) \subseteq C(y_i, u)$ with $w - u \le 5/M$. For
$w, u \ge 0$ we have
\[
  2B(1; d/2, 1/2)\, \pi\big(C(y_i, u) \setminus C(y_i, w)\big)
  \le \sqrt{10/M + 25/M^2} \le \sqrt{35/M}.
\]
Thus, in general we have 

Tt{C{y h u)\C{y h w))< 



< 



35 



S(l;d/2,l/2) ~ 5(l;rf/2,l/2)A^ 1 /(2rf)- 
The last expression is bounded by 5 for 



iV 



35 d c^ 



B 2d (l;d/2, l/2)6 2d 



□ 



Lemma 8 Assume that $N^{1/d}/c_d \ge 1/2$ (otherwise (14) is trivial). Then we
have for $\delta = d^{1/2} n^{-1/2}$ that there exists an absolute constant $c > 0$ such that

\[
  |\Gamma_\delta| \le c\, n^{d+1}.
\]

Proof. We have

\begin{align*}
  |\Gamma_\delta| &\le N(2M + 1) \\
  &\le N(2N^{1/d}/c_d + 1) \\
  &\le \frac{4N^{1+1/d}}{c_d} \\
  &\le \frac{8 \cdot 35^{d+1} c_d^d}{B^{2(d+1)}(1; d/2, 1/2)\, \delta^{2(d+1)}}.
\end{align*}

Thus for $\delta = d^{1/2} n^{-1/2}$ we obtain

\[
  |\Gamma_\delta| \le \frac{280 \cdot 735^d\, n^{d+1}}{d^{d+1} B^{2(d+1)}(1; d/2, 1/2)}
  = \frac{280 \cdot 735^d\, (\Gamma((d+1)/2))^{2(d+1)}}{d^{d+1} (\Gamma(d/2)\Gamma(1/2))^{2(d+1)}}\, n^{d+1},
\]

where $\Gamma$ is the Gamma function. Using Stirling's formula for the Gamma
function we obtain that there is an absolute constant $c > 0$ such that
$|\Gamma_\delta| \le c\, n^{d+1}$. □